Telling English Tweets Apart: the Case of US, GB, AU
نویسندگان
چکیده
In this paper, we study how to automatically tell different varieties of English apart on Twitter by taking samples from American (US), British (GB) and Australian (AU) English. We track cities and apply filters to generate ground-truth data. We perform expert evaluation to get a sense of the difficulty of the task. We then cast the problem as a classification task: given a tweet (or a set of tweets from a user) in English, the goal is to automatically identify whether the tweet (or set of tweets) is US, GB or AU English. We perform experiments to compare some linguistic features against simple statistical features and show that character Ngrams are quite effective for the task. Our work is closely related to socio-linguistics, especially research on diatopic varieties, linguistic landscapes, and World Englishes [5].
منابع مشابه
The Influence of Affective Variables on the Complexity, Accuracy, and Fluency in L2 Oral Production: The Contribution of Task Repetition
The main purpose of the study reported in this paper was to examine the interrelationships between L2 risk-taking, English learning motivation, L2 speaking anxiety, linguistic confidence, and low-proficiency English as a foreign language (EFL) learners’ speaking complexity, accuracy, and fluency (CAF). A secondary purpose was to test whether task repetition can influence the level of the mentio...
متن کاملTelling Apart Tweets Associated with Controversial versus Non-Controversial Topics
In this paper, we evaluate the predictability of tweets associated with controversial versus non-controversial topics. As a first step, we crowd-sourced the scoring of a predefined set of topics on a Likert scale from non-controversial to controversial. Our feature set entails and goes beyond sentiment features, e.g., by leveraging empathic language and other features that have been previously ...
متن کامل2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework
Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...
متن کاملA one-dimensional model for variations of longitudinal wave velocity under different thermal conditions
Ultrasonic testing is a versatile and important nondestructive testing method. In many industrial applications, ultrasonic testing is carried out at relatively high temperatures. Since the ultrasonic w...
متن کاملEthicality of Narrative Inquiry as a Tool of Knowledge Production in Research
The sociocultural ways of conceptualizing human learning in general education have given rise to various sensitive and time-consuming tools of knowledge production for both the researcher and the researched. The gravity of the situation is more noticeable in narrative inquiry methodology, which has gathered momentum in both general education and second language teacher education (SLTE) because ...
متن کامل